feat!: support v0.11.1 #112

Merged

ILikeIneine merged 32 commits into master from support-vllm-0.11.1

Nov 14, 2025

Member

ILikeIneine commented Oct 27, 2025

Purpose

This PR adds support for vLLM v0.11.1.

Test Plan

Test Result

(Optional) Documentation Update

Essential Elements of an Effective PR Description Checklist

The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
The test plan, such as providing test command.
The test results, such as pasting the results comparison before and after, or e2e results
(Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.

ILikeIneine self-assigned this

ILikeIneine marked this pull request as draft

October 27, 2025 07:52

gemini-code-assist Bot reviewed


Contributor

gemini-code-assist Bot left a comment

Code Review

This pull request updates the codebase to support vllm v0.11.1, which involves significant refactoring around memory allocation, platform integration, and attention mechanisms. The changes appear to align with the goal of supporting the new vllm version. I have found one critical issue in the device allocator patch that could lead to a runtime error and have provided a fix.

vllm_metax/patch/device_allocator/device_allocator.py

Comment on lines +48 to +53

+                  if len(self._sleep_saved_buffers):
+                      model = self.model_runner.model
+                      for name, buffer in model.named_buffers():
+                          if name in self._sleep_saved_buffers:
+                              buffer.data.copy_(self._sleep_saved_buffers[name].data)
+                      self._sleep_saved_buffers = {}

Contributor

gemini-code-assist Bot Oct 27, 2025

There is a potential AttributeError here. The self._sleep_saved_buffers attribute is only initialized within the sleep method, and only when level == 2. If wake_up is called after sleep(level=1) or before any call to sleep, self._sleep_saved_buffers will not exist on the object, causing a crash when len() is called on it.

To prevent this, you should safely check for the attribute's existence before trying to access it.

Suggested change

      
-                 if len(self._sleep_saved_buffers):
+                 if hasattr(self, "_sleep_saved_buffers") and self._sleep_saved_buffers:
                      model = self.model_runner.model
                      for name, buffer in model.named_buffers():
                          if name in self._sleep_saved_buffers:
                              buffer.data.copy_(self._sleep_saved_buffers[name].data)
                      self._sleep_saved_buffers = {}
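
The failure mode described in this review comment can be reproduced in isolation. The following is a minimal sketch, not the vllm_metax code: the `Worker` class, its `buffers` dict, and the method names `wake_up_unguarded` / `wake_up_guarded` are hypothetical stand-ins, and plain lists are used in place of tensors. It assumes, as the comment states, that the attribute is only created on the `level == 2` path.

```python
class Worker:
    """Hypothetical stand-in for the patched worker (not the real class)."""

    def __init__(self):
        # Plain lists stand in for model buffers (assumption for illustration).
        self.buffers = {"rope.cos_cache": [1.0, 2.0]}

    def sleep(self, level=1):
        # The saved-buffer dict is created only on the level-2 path.
        if level == 2:
            self._sleep_saved_buffers = {
                name: list(values) for name, values in self.buffers.items()
            }

    def _restore(self):
        # Copy saved contents back in place, then clear the saved copies.
        for name, saved in self._sleep_saved_buffers.items():
            if name in self.buffers:
                self.buffers[name][:] = saved
        self._sleep_saved_buffers = {}

    def wake_up_unguarded(self):
        # Raises AttributeError if sleep(level=2) never ran.
        if len(self._sleep_saved_buffers):
            self._restore()

    def wake_up_guarded(self):
        # getattr with a default covers both "never set" and "empty dict".
        if getattr(self, "_sleep_saved_buffers", None):
            self._restore()


if __name__ == "__main__":
    w = Worker()
    w.sleep(level=1)
    try:
        w.wake_up_unguarded()
    except AttributeError as exc:
        print("unguarded wake_up failed:", type(exc).__name__)
    w.wake_up_guarded()  # no saved buffers, so this is a no-op
```

`getattr(self, "_sleep_saved_buffers", None)` is equivalent in effect to the `hasattr(...) and ...` check suggested above. Another option, if the class's constructor can be patched, would be to initialize the attribute to an empty dict up front so that no guard is needed.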

ILikeIneine force-pushed the support-vllm-0.11.1 branch from a601543 to 1f48880

October 29, 2025 03:24

leex404 force-pushed the support-vllm-0.11.1 branch 2 times, most recently from ab31312 to f516af8

November 4, 2025 08:35

ILikeIneine and others added 18 commits

November 7, 2025 14:09


          support platform and remove kernel copy

856b728

Signed-off-by: Hank <hcc.mayday@gmail.com>


          update pre-commit

a52233f

Signed-off-by: Hank <hcc.mayday@gmail.com>


          update version and requirements

9e86f3e

Signed-off-by: Hank <hcc.mayday@gmail.com>


          update flashinfer

6b0b8e6

Signed-off-by: Hank <hcc.mayday@gmail.com>


          update build requirements

9ec7b23

Signed-off-by: Hank <hcc.mayday@gmail.com>


          update attention backends

ec972a6

Signed-off-by: Hank <hcc.mayday@gmail.com>


          update patch

633ff80

Signed-off-by: Hank <hcc.mayday@gmail.com>


          update quant_method

19c876b

Signed-off-by: Hank <hcc.mayday@gmail.com>


          update fuse_moe (todo: fix mypy)

53017fa

Signed-off-by: Hank <hcc.mayday@gmail.com>


          update deepseek_v2.py(todo: fix indexer kernel)

2a3936c

Signed-off-by: Hank <hcc.mayday@gmail.com>


          [feat] support bf16 cp_gather_indexer_k_cache kernel

fbf5235

Signed-off-by: Xin Li <lixin1620@gmail.com>


          [fix] fix type error in bf16_paged_mqa_logits

13a6e97

Signed-off-by: leex404 <lixin1620@gmail.com>


          [feat] add topk logits ops

7dc236d

Signed-off-by: leex404 <lixin1620@gmail.com>


          [fix] private memory size too large in `sample_recovered_tokens_kernel` (#115)

63aa6da

* [fix] fix sample_recovered_tokens_kernel use too much private memory

Signed-off-by: Xin Li <xin.li@metax-tech.com>

* [fix] fix type error in bf16_paged_mqa_logits

Signed-off-by: Xin Li <xin.li@metax-tech.com>

* [chore] change file directory

Signed-off-by: Xin Li <xin.li@metax-tech.com>

---------

Signed-off-by: Xin Li <xin.li@metax-tech.com>
Co-authored-by: Xin Li <xin.li@metax-tech.com>

Signed-off-by: leex404 <lixin1620@gmail.com>


          [fix] fix missing topk logits custom ops definition

1d9c4d4

Signed-off-by: leex404 <lixin1620@gmail.com>


          [fix] add custom gptq_shuffle ops

0a459f2

Signed-off-by: leex404 <lixin1620@gmail.com>


          [fix] fix compile error

3a2cfb0

Signed-off-by: leex404 <lixin1620@gmail.com>


          platform config update

32d2d83

Signed-off-by: Hank <hcc.mayday@gmail.com>

ILikeIneine force-pushed the support-vllm-0.11.1 branch from de238f9 to 32d2d83

November 7, 2025 06:11

ILikeIneine and others added 5 commits

November 7, 2025 18:50


          update qwen2.5_vl model

34c03c6

Signed-off-by: Hank <hcc.mayday@gmail.com>


          [fix] fix torch not found maca device

c9bd90a

Signed-off-by: leex404 <lixin1620@gmail.com>


          remove hotfixes patch for torch2.8

47baaef

Signed-off-by: Hank <hcc.mayday@gmail.com>


          remove needless patch

bbcc778

related: vllm-project/vllm/pull/27322

Signed-off-by: Hank <hcc.mayday@gmail.com>


          [feat] topk_softmax support renormalize and bf16

6ecac1e

Signed-off-by: leex404 <lixin1620@gmail.com>

leex404 and others added 9 commits

November 12, 2025 10:58


          [fix] update fused_moe to fit v0.11.1

5317d66

Signed-off-by: leex404 <lixin1620@gmail.com>


          [fix] fix fused moe config log missing

b870702

Signed-off-by: leex404 <lixin1620@gmail.com>


          use flash_attn as vit attn backend on qwen_vl

dc0fad9

Signed-off-by: Hank <hcc.mayday@gmail.com>


          update quant_conf registry

678dd1a

Signed-off-by: Hank <hcc.mayday@gmail.com>


          fix and apply latest pre-commit of v0.11.1

e6ddd33

Signed-off-by: Hank <hcc.mayday@gmail.com>


          [feat] Keep all AITER kernels in _aiter_ops

be4945a

Signed-off-by: leex404 <lixin1620@gmail.com>


          fix pre-commit on type casting

a6d8b1f

Signed-off-by: Hank <hcc.mayday@gmail.com>


          [fix] fix DeepSeek import error

Signed-off-by: leex404 <lixin1620@gmail.com>


          [feat] update deepseek_v2 to fit v0.11.1

b557d27

Signed-off-by: leex404 <lixin1620@gmail.com>

ILikeIneine marked this pull request as ready for review

November 14, 2025 02:09

ILikeIneine merged commit 0a392da into master

2 of 4 checks passed

ILikeIneine changed the title ~~[WIP] support v0.11.1~~ feat!: support v0.11.1

ILikeIneine added the v0.11.1 label

ILikeIneine deleted the support-vllm-0.11.1 branch

November 24, 2025 07:26

ILikeIneine restored the support-vllm-0.11.1 branch

November 24, 2025 07:26

RedWhiteCATT mentioned this pull request

[Bug]: mineru model reports an error #155

Closed

ILikeIneine deleted the support-vllm-0.11.1 branch

December 1, 2025 10:32
